Evaluating urine volume and host depletion methods to enable genome-resolved metagenomics of the urobiome

Background: The gut microbiome has emerged as a clear player in health and disease, in part by mediating host response to environment and lifestyle. The urobiome (microbiota of the urinary tract) likely functions similarly. However, efforts to characterize the urobiome and assess its functional potential have been limited by technical challenges, including low microbial biomass and high host cell shedding in urine. Here, to begin addressing these challenges, we evaluate urine sample volume (0.1 mL to 5 mL) and host DNA depletion methods, and their effects on urobiome profiles in healthy dogs, which are a robust large animal model for the human urobiome. We collected urine from seven dogs and fractionated samples into aliquots. One set of samples was spiked with host (canine) cells to model a biologically relevant host cell burden in urine. Samples then underwent DNA extraction followed by 16S rRNA gene and shotgun metagenomic sequencing. We then assembled metagenome-assembled genomes (MAGs) and compared microbial composition and diversity across groups. We tested six methods of DNA extraction: QIAamp BiOstic Bacteremia (no host depletion), QIAamp DNA Microbiome, Molzym MolYsis, NEBNext Microbiome DNA Enrichment, Zymo HostZERO, and Propidium Monoazide. Results: In relation to urine sample volume, ≥ 3.0 mL resulted in the most consistent urobiome profiling. In relation to host depletion, individual (dog) but not extraction method drove overall differences in microbial composition. DNA Microbiome yielded the greatest microbial diversity in both 16S rRNA and shotgun metagenomic sequencing data, and maximized MAG recovery while effectively depleting host DNA in host-spiked urine samples. As proof-of-principle, we then mined MAGs for core metabolic functions and environmental chemical metabolism. We identified long chain alkane utilization in two of the urine MAGs. Long chain alkanes are common pollutants that result from industrial combustion processes and end up in urine. Conclusions: This is the first study, to our knowledge, to demonstrate environmental chemical degradation potential in urine microbes through genome-resolved metagenomics. These findings provide guidelines for studying the urobiome in relation to sample volume and host depletion, and lay the foundation for future evaluation of urobiome function in relation to health and disease.


Introduction
Alterations in the urobiome (microbiota of the urinary tract) have been associated with bladder cancer [1,2], incontinence [3], urinary tract infection [4,5], and urolithiasis [6,7], but study of the urobiome remains challenging. Urine culturing is commonly employed to identify microbes and microbial functions (e.g., antimicrobial resistance) present in urine. However, standard urine culture captures very few members of the urobiota. More recently, expanded quantitative urine culturing methods (EQUC) have improved culture resolution, but many urobiota remain uncultured, highlighting the need for effective culture-independent methods to profile the urobiota [8]. Critically, sequencing-based studies of the urobiome are also fraught with technical challenges.
First, urine generally contains low microbial biomass [9], making urine samples vulnerable to contamination by microbes or microbial DNA introduced during extraction or sequencing [10,11].
Additionally, there are no evidence-based guidelines on minimum urine volumes for microbiome research, and studies on the urobiome range from using 0.5 mL [12] to 50 mL of urine [13]. Importantly, there are conditions (e.g., urinary tract inflammation), populations (e.g., pediatric), and model species (e.g., dogs, rodents) for which collecting 10 mL of urine or more in a single void may be infeasible.
Finally, urine can contain a high burden of host cells, especially in diseased states such as urinary tract infection or bladder cancer [14][15][16], which can complicate DNA extraction, introduce noise in 16S rRNA profiling [17], and overwhelm shotgun sequencing attempts with host reads rather than microbial reads. This in turn limits our ability to understand the functional potential of the urobiome and how these functions drive health and disease.
Commercial DNA extraction methods and published protocols that include host cell and DNA depletion are available, but these methods have not been comparatively evaluated in urine. In this study, we assess four commercially available DNA extraction kits that include host DNA depletion (MolYsis Complete5; NEBNext Microbiome DNA Enrichment Kit; QIAamp DNA Microbiome Kit; and Zymo HostZERO) as well as a protocol using light-activated propidium monoazide, and compare them to a method with no host depletion (BiOstic Bacteremia). Host depletion has been successful in other low-microbial-biomass, high-host-biomass substrates including breast milk, oral, respiratory tract, and tumor samples [18][19][20][21][22].
For example, in saliva samples, two host depletion methods reduced the host read proportion from 95% to < 30%, thereby improving the microbial resolution of shotgun metagenomics [19]. Host depletion methods offer promise for improving characterization of urobiome structure and function, but require evaluation for efficacy in urine samples.
The urobiome has been characterized via culture, whole genome sequencing of urine isolates, 16S rRNA gene sequencing, and shotgun metagenomic sequencing. However, few studies have reported metagenome-assembled microbial genomes (MAGs) and genome-resolved community analyses of the urobiota [23][24][25][26]. Bioinformatic construction of MAGs from urine would allow for more thorough functional reconstruction of the urobiome, including rare and unculturable taxa, revealing potentially important mechanistic links between the urobiome and disease in a genome-resolved fashion [27,28].
In this study, we tested several approaches for studying the urobiome using urine from healthy dogs. Dogs are a robust translational model for the human urobiome [1][29][30][31] and for urinary tract diseases, including bladder cancer [32] and urinary tract infection [29][30][31]. We specifically set out to i) assess the impact of urine sample volume on urine microbial community profiles (Fig. S1), ii) determine how DNA extraction methods that include host depletion affect urobiome profiles (16S rRNA and shotgun metagenomics) (Fig. S2), iii) determine if we could sufficiently reconstruct MAGs of urine microbes from shotgun metagenomic data to then mine them for relevant microbial functions, and iv) assess if and how urine microbes metabolize environmental chemicals linked with urinary tract diseases like bladder cancer.

Urine Volume Experiment
The goal of this first experiment was to determine if microbial community profiles and the presence/abundance of microbial contaminants (e.g., from reagents or kits) differed by urine sample volume (Experimental Design: Fig. S1).

Subject Recruitment
Healthy dogs were recruited through the Ohio State University Veterinary Medical Center (IACUC: 2020A00000050). Each dog underwent a comprehensive physical exam, blood work (serum chemistry, complete blood count), urinalysis, and urine culture. All dogs were between one and ten years of age and weighed at least 20 lb, with a body condition score of 4 or 5 (out of 9) and normal muscle condition. Dogs with a history, physical examination findings, clinical signs, or laboratory abnormalities consistent with urinary tract, liver, kidney, or gastrointestinal disease were excluded. Dogs with any history of antibiotic use, chemotherapy, or radiation in the past three months were also excluded (Table S1).

Urine Sample Collection & Preparation
Midstream, free-catch urine was collected and stored from 5 healthy dogs as described previously [33].
Samples were centrifuged at 4°C and 20,000 x g for 30 minutes. Following centrifugation, the supernatant was discarded and the pellet was saved. The pellets were then used for DNA extractions.

DNA Extraction & Quantification
DNA was extracted using the QIAamp BiOstic Bacteremia DNA Kit (Bacteremia; Qiagen, Hilden, Germany), as described previously [34]. This kit does not include host depletion steps. Briefly: pellets were resuspended in a lysis buffer and underwent two rounds of bead beating at 6 m/s for 60 s in an MP FastPrep-24 5G (MP Biomedicals, Solon, OH). Following bead beating, samples were cleaned using the kit's inhibitor removal solution and processed according to the manufacturer's protocol. All centrifugation steps were conducted at 13,000 x g and, in the final step, samples were eluted twice through the silica membrane to maximize DNA yield. DNA concentrations were quantified using a Qubit Fluorometer (ThermoFisher Scientific, Waltham, MA).

16S rRNA Gene Library Preparation and Sequencing
DNA then underwent library preparation and sequencing at Argonne National Laboratory (Lemont, IL), as described previously [34]. Briefly: we used primers 515F and 806R to amplify the V4 region of the 16S rRNA gene, followed by paired-end amplicon sequencing via Illumina MiSeq (2x250). Sequences are available at NCBI Bioproject PRJNA1109516.

16S rRNA Gene Sequence Processing and Statistical Analyses
Raw sequences were processed using QIIME2 v.2023-5. Reads were denoised and resolved into amplicon sequence variants (ASVs) using DADA2 [35] with the following parameters: 5 base pairs (bp) were trimmed from the 5' end of each read, forward reads were truncated at 225 bp, and reverse reads were truncated at 220 bp. Putative contaminant reads were identified and removed using the R package decontam [36] with prevalence-based filtering (threshold = 0.5) (Table S2). Microbial contaminants are microbes or microbial sequences that are introduced during the extraction, library preparation, or sequencing process. These contaminants are putatively identified based on their tendency to be more prevalent or abundant in negative control samples (n = 7). Contaminant read counts were exported into a new table for analysis. Contaminant abundances were calculated by dividing each count by the total number of 16S rRNA reads in each sample, and contaminant abundances between groups were statistically compared using the Friedman test. Taxonomy was assigned using the Silva 138 99% OTU 515F/806R classifier. Unassigned sequences and sequences assigned to mitochondria or eukaryotes were removed. A total of 37 samples were sequenced, and sequencing depth (including negative controls) ranged from 1 to 30,408 reads. Samples with fewer than 4,125 reads were excluded from analyses, and remaining samples were rarefied to this depth. This excluded all negative controls and 6 true samples, which were largely low volume samples (3 samples from dog ArB (0.1 mL, 0.5 mL, 3 mL), 1 sample from dog MS (0.1 mL), and 2 samples from dog FC (0.1 mL, 0.5 mL)).
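The contaminant abundance calculation and the depth filter described above can be illustrated with a short Python sketch. This is not the QIIME2 or decontam implementation used in the study; the function names, the example ASV tables, and the fixed random seed are hypothetical, and only the 4,125-read depth comes from the text.

```python
import random
from collections import Counter

RAREFACTION_DEPTH = 4125  # depth stated in the text


def relative_abundance(contaminant_count, total_reads):
    """Contaminant reads divided by total 16S rRNA reads in the sample."""
    if total_reads == 0:
        return 0.0
    return contaminant_count / total_reads


def rarefy(counts, depth=RAREFACTION_DEPTH, seed=0):
    """Subsample a {feature: count} table to `depth` reads without replacement.

    Returns None when the sample has fewer reads than `depth`,
    i.e. the sample would be excluded from analysis.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the table into one entry per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical ASV tables, for illustration only.
deep_sample = {"ASV_a": 3000, "ASV_b": 1500, "ASV_c": 500}
shallow_sample = {"ASV_a": 40, "ASV_b": 2}

rarefied = rarefy(deep_sample)        # kept and subsampled to 4,125 reads
excluded = rarefy(shallow_sample)     # None: below the depth threshold
```

Subsampling without replacement is assumed here because it is the usual meaning of rarefaction; the text does not specify the sampling scheme.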
For all analyses, statistical significance was set at p < 0.05. Microbial diversity (Shannon Index, Observed Features, and Faith's Phylogenetic Diversity) and distance metrics (Bray-Curtis, Jaccard, and UniFrac) were calculated and tested using QIIME2 and the R packages phyloseq and vegan. Differences in bacterial diversity were assessed via t-test, Friedman test, or Kruskal-Wallis test depending on the normality and pairing of the data, and pairwise comparisons were conducted using the Benjamini, Krieger, and Yekutieli procedure for controlling the false discovery rate (FDR) at Q = 0.05 [37]. Differences in microbial composition were assessed via PERMANOVA, with Q = 0.05 for FDR adjustments in pairwise comparisons.
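For readers unfamiliar with the Benjamini, Krieger, and Yekutieli procedure, the following Python sketch shows one common form of it, the two-stage linear step-up. It assumes this is the variant meant in the text, which does not name the software or the exact variant used. The function names and the example p-values are hypothetical.

```python
def bh_reject_count(sorted_p, q):
    """Number of rejections from the Benjamini-Hochberg linear step-up at level q."""
    m = len(sorted_p)
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= i * q / m:
            k = i  # the largest rank meeting the threshold determines rejections
    return k


def bky_two_stage(pvalues, q=0.05):
    """Two-stage step-up (assumed BKY variant); returns one reject flag per p-value."""
    m = len(pvalues)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: pvalues[i])
    sorted_p = [pvalues[i] for i in order]

    # Stage 1: step-up at the reduced level q' = q / (1 + q).
    q1 = q / (1 + q)
    r1 = bh_reject_count(sorted_p, q1)

    if r1 == 0:
        k = 0                      # nothing rejected
    elif r1 == m:
        k = m                      # everything rejected
    else:
        # Stage 2: estimate the number of true nulls and rerun at an adjusted level.
        m0_hat = m - r1
        k = bh_reject_count(sorted_p, q1 * m / m0_hat)

    rejected = [False] * m
    for rank in range(k):
        rejected[order[rank]] = True
    return rejected


# Hypothetical pairwise p-values, for illustration only.
flags = bky_two_stage([0.001, 0.008, 0.039, 0.041, 0.9], q=0.05)
```

In this example the first stage rejects two hypotheses and the second stage, using the estimated number of true nulls, rejects four, which illustrates why the adaptive procedure can be more powerful than a single step-up pass.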

Host Depletion - 16S rRNA Gene
The goal of this experiment was to evaluate how DNA extraction methods that include host depletion steps affected bacterial DNA recovery and microbial community profiles (Experimental Design: Fig. S2). Subject recruitment occurred as described above.

Urine Sample Collection, Host Cell Spiking, & DNA Extraction
We collected midstream, free-catch urine from seven healthy dogs (Table S1). Urine was then aliquoted into two batches: one batch was spiked with canine cells (canine thyroid adenocarcinoma cells, CTAC [38]) to a concentration of 75,000 cells/mL, to model a biologically relevant host cell concentration in urine from healthy dogs [39]. The other batch remained unspiked. Urine samples were then pelleted as described above. All urine samples then underwent DNA extraction using six different extraction methods: QIAamp BiOstic Bacteremia DNA Kit (Bacteremia; Qiagen, Hilden, Germany); MolYsis Complete5 (Molzym, Bremen, Germany); NEBNext Microbiome DNA Enrichment Kit (New England Biolabs, Ipswich, MA); QIAamp DNA Microbiome Kit (Qiagen, Hilden, Germany); HostZERO Microbial DNA Kit (Zymo Research, Irvine, CA); and a protocol using light-activated propidium monoazide described in Marotz et al., 2018 [19]. All of these methods except Bacteremia included host depletion steps. In addition to urine samples, we also included a positive control sample (ZymoBIOMICS Gut Microbiome Standard, Zymo Research, Irvine, CA, Table S3) that we extracted with each method. The ZymoBIOMICS Gut Microbiome Standard contains 21 microbial strains, including 18 bacteria, 2 microbial eukaryotes, and one archaeon. Samples were extracted according to the respective manufacturers' protocols, with modifications described below. Each extraction also included a negative control (blank) that underwent extraction and sequencing along with all the true samples (n = 6). All extracted DNA was stored at -80°C until library preparation and sequencing. Unspiked samples underwent 16S rRNA gene sequencing; spiked samples underwent shotgun metagenomic sequencing (described under Host Depletion - Shotgun Metagenomics).

QIAamp BiOstic Bacteremia (Qiagen)
No host depletion is included in this protocol. The protocol is detailed above under Urine Volume Experiment. For the ZymoBIOMICS Gut Microbiome Standard, prior to extraction, the standard was centrifuged at 20,000 x g for 10 minutes. The supernatant and pellet were then separated, and both were saved. Per recommendations from Zymo, the pellet was then processed through the Bacteremia kit following the manufacturer's protocol. In the final step, the supernatant was added to the MB spin column (along with the pellet lysate) and centrifuged at 13,000 x g for 1 minute. This captured any additional DNA from the supernatant on the spin column.

Molzym MolYsis Complete5 (Molzym MolYsis)
This method uses a chaotropic buffer to selectively lyse host cells, then removes host DNA using a DNase prior to extracting microbial DNA. Samples were extracted following the manufacturer's protocol.

NEBNext Microbiome DNA Enrichment Kit (NEBNext)
This method uses nonselective lysis followed by selective binding and depletion of CpG-methylated host DNA in order to enrich microbial DNA recovery. Samples were first extracted using the QIAamp BiOstic Bacteremia DNA Kit and frozen at -80°C. Samples were then thawed, and further extraction was performed according to the NEBNext manufacturer's protocol. For samples that did not have detectable DNA after the initial extraction, a threshold of 0.05 ng/µL was used to calculate the MBD2-Fc Protein to Protein A magnetic bead value. A solution of MBD2-Fc Protein and Protein A magnetic beads was prepared and aliquoted into each sample accordingly. To avoid DNA loss, purification was not performed at the end of the protocol (neither option A nor B).

QIAamp DNA Microbiome Kit (DNA Microbiome)
This method uses selective osmotic lysis and Benzonase to degrade host cells and digest host DNA prior to extraction of microbial DNA. A Thermomixer at 600 rpm was used instead of end-over-end rotation. Prior to extraction, the ZymoBIOMICS Gut Microbiome Standard was centrifuged at 20,000 x g and the supernatant was saved. To maximize DNA recovery per recommendations from Zymo, the pellet was processed through the kit, and the supernatant was added to the MB spin column and centrifuged at 13,000 x g for 1 minute. The flow-through was discarded, and lysate from the pellet was added per the manufacturer's protocol at step 12.

Zymo HostZERO Microbial DNA Kit (Zymo)
This method uses selective osmotic lysis followed by enzymatic degradation of DNA to degrade host cells and host DNA prior to extraction of microbial DNA. A FastPrep-24 5G bead beater was used for optimized lysis (Appendix D of manufacturer's protocol). Extraction proceeded following the manufacturer's protocol. Samples were eluted with 20-26 µL ZymoBIOMICS DNase/RNase-Free Water.

Propidium Monoazide (PMA)
This method uses PMA to intercalate the DNA of membrane-disrupted host cells, and light activation triggers covalent bonding between dsDNA and PMA, fragmenting the DNA. Samples were pretreated with 10 µM PMA as described in Marotz et al., 2018 [19], beginning with resuspending urine pellets in 200 µL sterile water. After PMA treatment, samples were stored at -20°C and then extracted using the Qiagen QIAamp BiOstic Bacteremia kit.

DNA Quantification and 16S rRNA Gene Sequencing
In both spiked and unspiked samples, we quantified total DNA via Qubit fluorometer and bacterial DNA via qPCR using universal 16S rRNA gene bacterial primers as described previously [34,40]. Bacterial concentrations were compared between groups using either Friedman tests or Kruskal-Wallis tests.
Finally, we analyzed microbial community profiles (16S rRNA gene sequencing) in each sample. Library preparation, sequencing, decontamination (Table S4), and analysis were conducted as described above in the Urine Volume Experiment with the following DADA2 parameters: 5 bp were trimmed from the 5' end of each read, forward reads were truncated at 250 bp, and reverse reads were truncated at 231 bp. One urine sample (Dog SJ, Extraction Method: Molzym MolYsis) appeared to be cross-contaminated with DNA from the ZymoBIOMICS Gut Microbiome Standard and was excluded from analysis (Fig. S3). Statistical analyses were performed as described above (see Urine Volume Experiment) to assess differences by extraction method.

16S rRNA Gene Sequence Processing and Statistical Analyses
16S rRNA gene sequencing processing and statistical analyses were performed as described above (see Urine Volume Experiment) to assess differences in microbial community diversity and composition by extraction method.

Host Depletion - Shotgun Metagenomics
The goal of this experiment was to assess host depletion by extraction method and the viability of performing genome-resolved metagenomics on low biomass urine samples. To do this, we used the same urine samples and ZymoBIOMICS Gut Microbiome Standard positive control from the Host Depletion - 16S rRNA Gene experiment described above and spiked them with host (canine) cells (Experimental Design: Fig. S2). Spiking samples with an equal concentration of host cells allowed us to best assess the host DNA depletion efficacy of each method. DNA was extracted using five of the six methods described above under Host Depletion - 16S rRNA Gene (Bacteremia, DNA Microbiome, Molzym MolYsis, Propidium Monoazide, and Zymo HostZERO). NEBNext sequences did not pass quality control in 16S rRNA gene sequencing; we therefore excluded these samples from shotgun metagenomic sequencing.

Shotgun Metagenomic Library Preparation and Sequencing
Samples underwent shotgun metagenomic sequencing at the Ohio State University Infectious Diseases Institute - Genomics and Microbiology Solutions (IDI-GEMS) Laboratory. Metagenomic libraries were prepared following the Illumina (San Diego, CA) DNA Library Prep protocol with the following modifications: 1) Illumina's (M) beads were substituted with (L) beads to obtain larger insert sizes, 2) 9 or 12 PCR amplification cycles were used based on sample DNA concentration (Qubit) (Fig. S4), and 3) library purification was performed using a 1:1 sample to bead ratio. Samples were barcoded using IDT for Illumina UD Indexes. Tagmentation-based library construction has been validated and adopted as a standard operating procedure within the IDI-GEMS Laboratory to characterize the presence of microbes in samples and was recently shown to be an effective, repeatable method for microbiome analysis of the human gut [41]. Metagenomic libraries were sequenced targeting a minimum of 50 million 2x150 base pair paired-end reads using an Illumina NextSeq2000. Negative extraction (n = 5) and sequencing (n = 2) controls were sequenced along with samples. Sequences were processed using the Ohio Supercomputer [42]. Sequences are available at NCBI Bioproject PRJNA1123238.

Metagenomic Sequence Processing and Statistical Analyses
Raw reads from the Illumina sequencer were quality filtered and trimmed of adapters using Trimmomatic [43]. Host reads were quantified by mapping to a concatenated canine and feline genome with CoverM [44]. Reads not assigned to the host were assumed to be microbial. Read counts were compared across extraction methods using the Friedman test. Taxonomy and abundance tables for microbial community profiling of metagenomes were generated using MetaPhlAn 4.0 [45] and SingleM [46] with SingleM condense. Metagenomes were de novo assembled into contigs using MEGAHIT [47] and quality assessed with QUAST [48]. Contigs were binned into MAGs using portions of the MetaWRAP [49] pipeline, which combines the binning methods MetaBat2 [50], MaxBin2 [51], and CONCOCT [52], and chooses the highest-quality representative of each bin from across these automated methods. dRep [53] was used to dereplicate MAGs at 99% average nucleotide identity, and CheckM [54] was used to evaluate MAGs for completeness and contamination. Only medium (> 70% completion and < 10% contamination) and high (> 95% completion and < 5% contamination) quality MAGs were retained for analysis. GTDB-Tk [55] was used to assign taxonomy to MAGs according to the Genome Taxonomy Database. Abundance tables of MetaPhlAn, SingleM, and MAG profiles were processed using decontam to identify putative contaminants. Because MetaPhlAn generates species-level taxonomic assignments, genera were also manually filtered: taxa commonly identified as kit contaminant genera [56] present in at least one negative control sample were bioinformatically removed, even if they were not filtered by decontam (Table S5). Additionally, reads assigned to taxa from the Zymo Gut Microbiome Standard in urine samples profiled with MetaPhlAn or GTDB-Tk were considered putative cross-contaminants and were removed from those samples. Diversity and community composition metrics from metagenomic data as well as read-level statistics were analyzed using the R packages phyloseq [57], vegan [58], and tidyverse [59]. Alpha diversity was compared between kits using Friedman tests, and comparisons between dogs were performed using Kruskal-Wallis. Pairwise comparisons were conducted using the Benjamini, Krieger, and Yekutieli procedure for controlling the false discovery rate (FDR) at Q = 0.05. Differences in microbial composition were assessed via PERMANOVA, with Q = 0.05 for FDR adjustments in pairwise comparisons. Genes in MAGs were annotated using DRAM [27].
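The MAG retention rule stated above (medium: > 70% completion and < 10% contamination; high: > 95% completion and < 5% contamination) can be expressed as a small Python sketch. The function name and the example values are hypothetical; in practice the completeness and contamination estimates would come from CheckM output.

```python
def mag_quality(completeness, contamination):
    """Classify a MAG using the thresholds stated in the text.

    Both arguments are percentages. Returns "high", "medium",
    or None when the MAG would not be retained.
    """
    if completeness > 95 and contamination < 5:
        return "high"
    if completeness > 70 and contamination < 10:
        return "medium"
    return None


# Hypothetical (completeness %, contamination %) estimates, for illustration only.
bins = {
    "bin_1": (97.2, 1.1),
    "bin_2": (82.0, 6.5),
    "bin_3": (60.0, 2.0),
}
retained = {name: mag_quality(*vals) for name, vals in bins.items()
            if mag_quality(*vals) is not None}
```

Note that a bin with high completeness but contamination between 5% and 10% falls into the medium tier under these thresholds.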

Hydrocarbon Degradation Profiling
As a proof-of-principle test, we then mined the MAGs for microbial functions of interest, including urea utilization and environmental chemical degradation. These functions are relevant because urine is a urea-rich environment, and environmental chemicals, such as polycyclic aromatic hydrocarbons, have been associated with urinary tract diseases like bladder cancer. Urea utilization was identified by searching within the DRAM output. To identify putative hydrocarbon degrading genes, we queried custom, curated, published Hidden Markov Model (HMM) profile databases: aerobic degradation of polycyclic aromatic hydrocarbon pathways (PAHp) [60], and markers for the activation of various hydrocarbons (CANT-HYD) [61]. Coding genes called by DRAM were queried against these databases using the hmmsearch function of HMMER (version 3.3) [62] and filtered to a maximum expect-value (e-value) of 1e-10. The full scores were compared to the score cutoffs specific to each gene in the database, i.e., gather cutoffs for PAHp and noise or trusted cutoffs implemented by CANT-HYD. Given the potential for high stringency in profiles generated largely from a few well-characterized model organisms, these cutoffs were relaxed to a minimum of 80% of the gather cutoff and 90% of the noise and trusted cutoffs for the respective databases.
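The hit filtering described above can be sketched in Python as follows. This is an illustration of the stated rule (e-value at most 1e-10, full score at least 80% of the gather cutoff for PAHp or 90% of the noise or trusted cutoff for CANT-HYD), not the authors' script; the function name, argument layout, and example scores are hypothetical.

```python
MAX_EVALUE = 1e-10  # maximum e-value stated in the text

# Fraction of the per-gene database cutoff that a hit must reach.
RELAXATION = {
    "PAHp": 0.80,      # applied to the gather cutoff
    "CANT-HYD": 0.90,  # applied to the noise or trusted cutoff
}


def passes_cutoff(database, full_score, evalue, cutoff):
    """Return True if an hmmsearch hit meets the relaxed, gene-specific cutoff.

    `cutoff` is the per-gene score cutoff taken from the database
    (gather for PAHp; noise or trusted for CANT-HYD).
    """
    if evalue > MAX_EVALUE:
        return False
    return full_score >= RELAXATION[database] * cutoff


# Hypothetical hits, for illustration only: (database, full score, e-value, cutoff).
hits = [
    ("PAHp", 410.0, 1e-50, 500.0),      # 410 >= 400, kept
    ("CANT-HYD", 410.0, 1e-50, 500.0),  # 410 < 450, dropped
    ("CANT-HYD", 460.0, 1e-8, 500.0),   # e-value too high, dropped
]
kept = [h for h in hits if passes_cutoff(*h)]
```

The same score can pass for one database and fail for the other because the relaxation factor differs between them.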

Urine Volume Experiment
Current urobiome studies vary widely in the volume of urine used for profiling microbial communities.
Moreover, low biomass samples, like urine, are highly susceptible to contamination by microbes or microbial DNA (hereafter referred to as "contaminants") that can be introduced during the DNA extraction and sequencing process. As such, in this experiment, we first assessed the relationship between urine sample volume and microbial contaminant load. Contaminants, as identified by decontam (Table S2), were at significantly lower relative abundances in urine samples of greater volume (Fig. 1A, Table S6, p = 0.026, Friedman).
We then evaluated bacterial diversity and composition by urine sample volume. Microbial richness, or the total number of unique ASVs in each sample, increased significantly with sample volume (Fig. 1B, S5, Table S7, p = 0.015, Friedman). Sequencing reads also increased with urine sample volume, although this difference was not significant (Fig. 1C, p = 0.075, Friedman).
Bacterial composition, however, did not differ significantly by urine sample volume but did differ significantly between dogs (Fig. 2A, S5, between dogs: p = 0.001, by urine sample volume: p = 0.98, Bray-Curtis, PERMANOVA), indicating that inter-dog differences overwhelmed differences based on sample volume. We next evaluated within-dog microbial composition by sample volume. Within each dog, the 3 mL and 5 mL samples were more consistent in microbial composition, while the 0.1, 0.2, 0.5, and 1 mL samples were more variable (Fig. 2B, S6). Based on this pattern, we grouped 3 mL and 5 mL urine samples into a "High" volume group, and the remaining urine volumes into a "Low" volume group. There was no significant difference in microbial composition between the High and Low groups (p = 0.6, PERMANOVA); however, High volume samples had significantly less variable microbial communities than Low volume samples, indicating that Low volume samples are more subject to stochasticity (Fig. 2C, S6; p = 0.0017, PERMDISP). Based on these results, we proceeded to use 3 mL urine samples for subsequent experiments.

Host Depletion - 16S rRNA Gene
Healthy urine contains shed host epithelial cells at a relatively low abundance. However, in the presence of urinary tract disease (e.g., urinary tract infection, bladder cancer, bladder stones), host cell shedding can dramatically increase. There are multiple DNA extraction methods that incorporate host cell / host DNA depletion steps to facilitate microbial DNA recovery. In this experiment, we evaluated how six different extraction methods affected DNA concentrations and microbial community profiles. Extraction methods included: QIAamp BiOstic Bacteremia DNA Kit (Bacteremia); MolYsis Complete5 (Molzym); NEBNext Microbiome DNA Enrichment Kit; QIAamp DNA Microbiome Kit (DNA Microbiome); HostZERO Microbial DNA Kit (Zymo HostZERO); and a protocol using light-activated propidium monoazide described in Marotz et al., 2018 [19]. All methods except Bacteremia included host depletion steps. The Bacteremia extraction method was included for reference here because this method has already been validated as an optimal method for profiling canine urine microbial communities [34], and it has been applied across multiple urobiome studies in humans and animals [1,4]. However, it has not been tested against extraction methods that include host depletion steps, which we did here.
We first compared how each extraction method impacted total and bacterial DNA concentrations derived from urine samples. We also compared DNA concentrations in urine samples that were unspiked versus those spiked with host (canine) cells. While healthy midstream, free-catch urine contains a low abundance of host cells, we opted to spike additional canine cells into urine at biologically relevant concentrations to best assess the host depletion capabilities of each extraction method. In unspiked samples, Bacteremia and NEBNext recovered the greatest total DNA concentrations (host + microbial), although this result was not significant (p = 0.62, Friedman, Fig. 3A). Bacteremia, DNA Microbiome, and Molzym MolYsis demonstrated significantly greater bacterial DNA recovery than propidium monoazide, Zymo HostZERO, and NEBNext, although no pairwise comparisons were significant (overall p = 0.014, Friedman, Fig. 3B). In spiked urine samples, Bacteremia and NEBNext recovered significantly greater total DNA than all other extraction methods (Fig. 3C, overall p < 0.0001, Friedman, pairwise p between Bacteremia or NEBNext and all other methods < 0.05), while DNA Microbiome recovered the most bacterial DNA, although overall differences in bacterial DNA concentrations by extraction method were only marginally significant (Fig. 3D, overall p = 0.051, Friedman). There was no significant difference in total or bacterial DNA recovery by dog in unspiked or spiked samples (Fig. S7).
We next assessed urine microbial diversity (16S rRNA) of unspiked urine samples by extraction method. Sequencing data from all samples extracted using NEBNext did not pass quality control steps [35] and, as such, were excluded from analysis. Urine microbial diversity varied significantly by extraction method (Fig. 4, S8, Table S8, Microbial richness p = 0.0018, Shannon Entropy p = 0.0091, Friedman). Specifically, urine samples extracted using Bacteremia and DNA Microbiome contained the greatest microbial richness (unique ASVs) and significantly greater microbial richness than samples extracted using Zymo HostZERO (Fig. 4A, Table S8, overall p = 0.0018, pairwise p = 0.0041, Friedman). Samples extracted via Bacteremia, DNA Microbiome, or propidium monoazide also exhibited the greatest microbial diversity (Shannon Entropy), all three showing significantly greater microbial diversity than samples extracted via Molzym MolYsis (Fig. 4B, Table S8, pairwise p = 0.025, 0.028, and 0.017, Friedman, respectively).

Host Depletion - Shotgun Metagenomics
We next assessed host depletion efficacy of each extraction method using shotgun metagenomic sequencing performed on urine samples spiked with host (canine) cells. Samples averaged 28.2 million paired-end reads per sample (range: 1,399 to 80 million reads, SD: 16.7 million reads). There was no significant difference in the total number of reads obtained per sample by extraction method (Fig. 5A, p = 0.12, Friedman). However, the total number of microbial reads did vary significantly by extraction method (Fig. 5B, p = 0.0039, Friedman), with DNA Microbiome, Molzym MolYsis, and Zymo HostZERO yielding a significantly greater number of microbial reads compared to Bacteremia, which includes no host depletion steps (all pairwise p = 0.01). The proportion of total microbial reads also varied significantly by extraction method, with Molzym MolYsis and Zymo HostZERO yielding the greatest proportion of microbial reads (Fig. 5C, overall p < 0.0001, pairwise p < 0.02, Friedman). In terms of host reads, each method yielded the following (on average): Bacteremia, 82% host reads; DNA Microbiome, 78%; Molzym MolYsis, 29%; PMA, 81%; Zymo HostZERO, 30%. Finally, we quantified the abundance of contaminant reads by extraction method and found that DNA Microbiome samples contained the lowest abundance of contaminant reads (Fig. 5D, overall p = 0.014, Friedman), although contaminant read abundances varied widely between samples (0-100%).
To determine whether efficacy in host depletion translated to improved capture of the urobiome, we employed MetaPhlAn4 and SingleM, computational tools used for profiling microbial communities from marker genes found in metagenomes. Urine microbial diversity varied significantly by extraction method (Fig. 6A, B, MetaPhlAn, Observed Species p = 0.011, Shannon entropy p = 0.002, Friedman), with DNA Microbiome yielding the greatest number of observed microbial species and significantly more species than all other extraction methods (all pairwise p = 0.014) except Molzym MolYsis. Urine microbial composition did not differ significantly by extraction method but did differ significantly by dog (Fig. 6C, D, MetaPhlAn4, By extraction method: Jaccard p = 0.67, Bray-Curtis p = 0.96; By dog: Jaccard p = 0.001, Bray-Curtis p = 0.001, PERMANOVA), indicating that interindividual variation overwhelmed microbial community differences due to extraction method. SingleM largely recapitulated the MetaPhlAn results (Fig. S9).
We then assessed the viability of performing genome-resolved metagenomics on low biomass urine samples. To do this, we assembled MAGs within each sample (assembly metrics for each sample: Fig. S10). We generated a total of 26 unique MAGs: 11 were bacteria found in the ZymoBIOMICS Gut Microbiome Standard (Table S3), five were derived from urine samples (Fig. 7), and 10 were probable contaminants (Table S5). The five E. coli strains present in the Standard assembled into a single MAG. The greatest number of urine-derived MAGs (n = 4) was identified in DNA Microbiome samples, while three or fewer MAGs were identified with all other extraction methods. The total number of MAGs did not vary by extraction method (Fig. S11, p = 0.3, Friedman), although fewer contaminant MAGs arose from DNA Microbiome samples as compared to other extraction methods (Fig. S11, overall p = 0.018, Friedman; no pairwise comparisons were significant).
Next, we compared the microbial taxonomic profiles generated by 16S rRNA sequencing, shotgun metagenomic sequencing (MetaPhlAn4), and genome-resolved metagenomics (MAGs) (Fig. 7). Each method is fundamentally different and employs different reference databases for taxonomy assignment. However, all five urine-derived MAGs also appeared in the top twenty most abundant taxa in the shotgun metagenomics and 16S datasets. Notably, Arcanobacterium is not present in the MetaPhlAn4 reference database, but was identified in the shotgun metagenomic data through the SingleM reference database (Fig. S9). Additional top 20 genera common between the metagenomics and 16S datasets include Peptacetobacter/Peptoclostridium spp. and Blautia spp.
Finally, we compared our capture of the ZymoBIOMICS Gut Microbiome Standard community across extraction, sequencing, and bioinformatic methods (Fig. S12). The Standard contained 21 microbial taxa including 18 bacterial strains, 1 archaeon, and 2 microbial eukaryotes at differing and biologically relevant abundances. Amongst the bacterial strains, there were 5 closely related strains of E. coli. In the 16S rRNA dataset, we were able to detect a total of 12/21 taxa, all of which were present at ≥0.1% abundance in the Standard. As expected, we did not detect the 2 microbial eukaryotes (which do not encode a 16S rRNA gene). We were also unable to differentiate the 5 E. coli strains in the Standard as this is not feasible with amplicon sequencing. We also did not detect the 4 taxa found at ≤0.01% abundance in the Standard (Methanobrevibacter smithii, Salmonella enterica, Enterococcus faecalis, Clostridium perfringens). In the shotgun metagenomic data profiled using MetaPhlAn4, we detected a total of 14/21 taxa in the Standard including the 2 microbial eukaryotes. As with 16S rRNA sequencing, we were able to detect all taxa present at ≥0.1% abundance in the Standard and were not able to detect the 4 taxa found at ≤0.01% abundance in the Standard. MetaPhlAn4 did not distinguish the 5 E. coli strains. We were further able to assemble a total of 11 MAGs from the shotgun metagenomic data. This included all taxa at ≥1.5% abundance, excluding the eukaryote Candida albicans, which was found at 1.5% abundance but for which we were not able to assemble a MAG. We assembled a single E. coli MAG (rather than the expected 5 unique E. coli strains). The threshold we employed for MAG dereplication (99% ANI) did not allow us to distinguish between the 5 E. coli strains; therefore, as with our 16S rRNA data, we only detected "one" E. coli taxon. A higher ANI (99.9%) and a tool other than dRep would be required for strain differentiation. We were not able to assemble a MAG for M. smithii, which was present at 0.1% abundance and detected in 16S rRNA and shotgun metagenomic sequencing. Across methods (16S rRNA, shotgun metagenomics, MAGs), samples extracted using Bacteremia and DNA Microbiome most closely matched the expected microbial taxonomic composition of the Standard (Fig. S12).
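The effect of the dereplication threshold on strain resolution can be illustrated with a toy clustering. The sketch below groups genomes whose pairwise average nucleotide identity (ANI) meets a threshold, using simple single-linkage merging. This is a simplification chosen for illustration and is not dRep's clustering procedure. The strain names and ANI values are hypothetical.

```python
def cluster_by_ani(names, ani, threshold):
    """Single-linkage clustering: merge genomes whose pairwise ANI >= threshold.

    names: list of genome names.
    ani: dict mapping (name_a, name_b) pairs to percent ANI.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical pairwise ANI (%) among three closely related strains.
strains = ["strain_1", "strain_2", "strain_3"]
pairwise_ani = {
    ("strain_1", "strain_2"): 99.4,
    ("strain_1", "strain_3"): 99.6,
    ("strain_2", "strain_3"): 99.5,
}

print(cluster_by_ani(strains, pairwise_ani, threshold=99.0))
print(cluster_by_ani(strains, pairwise_ani, threshold=99.9))
```

With these illustrative values, a 99% threshold collapses the strains into one cluster, while a 99.9% threshold keeps them separate, mirroring the single E. coli MAG described above.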

Functional Profiling of Urine Microbes
Relatively few studies have performed shotgun metagenomics in urine, and even fewer have generated MAGs [26], which has limited our understanding of the functional potential of the urobiome. In this study, as proof-of-concept, we mined the urine-derived MAGs for key functions. We first identified core metabolic pathways (e.g., glycolysis, citrate cycle) across all MAGs (Fig. S13A). Then we identified pathways associated with carbohydrate, nitrogen, acid, and alcohol metabolism. Specifically, we observed urea utilization in 2 of the MAGs: Staphylococcus pseudintermedius and Bacillus_A cereus (Fig. S13B).
Next, we looked for microbial metabolic pathways associated with environmental chemical metabolism. A number of environmental chemicals (e.g., arsenic, polycyclic aromatic hydrocarbons) have been linked to urinary tract diseases like bladder cancer [63]. The kidney filters many of these toxicants out of the blood and into the urine. Therefore, it is important to understand if and how urine microbes metabolize these chemicals and how that could impact disease risk. As such, we mined the urine MAGs for pathways associated with polycyclic aromatic hydrocarbon (PAH) and long-chain alkane degradation. PAHs and long-chain alkanes are common environmental pollutants produced during the combustion process and found in vehicle exhaust and industrial output [65-67]. We did not identify genes (> 80% gather cutoff) associated with PAH degradation, but we did identify genes for long-chain alkane utilization: ladB (91% of noise cutoff) in Bacillus_A cereus and ladA alpha (97% of trusted cutoff) in Staphylococcus pseudintermedius. Moreover, in B. cereus, we identified a full metabolic pathway starting with an alkanesulfonate monooxygenase (ssuD) that desulfonates organosulfonates to yield sulfite and an aldehyde (Fig. 8A). The presence of this pathway supports the possibility that B. cereus may be capable of utilizing a variety of hydrocarbons as potential carbon sources or electron donors. In S. pseudintermedius, we did not identify a complete metabolic pathway for long-chain alkane degradation, but the presence of alcohol and aldehyde dehydrogenase protein families suggests that long-chain alkanes activated by ladA may be further oxidized by this organism (Fig. 8B). Taken together, these results suggest that urine-derived microbes can metabolize environmental chemicals, and that microbial metabolism merits further investigation in relation to urinary tract disease risk.

Discussion
Studies of the urobiome are poised to reveal key insights in urinary tract health and disease; however, validation of approaches to profiling the urine microbial community is urgently needed. Here, we tested urine sampling volume and DNA extraction methods with host depletion using urine from healthy dogs. We identified a minimum urine volume threshold for 16S rRNA and shotgun metagenomic sequencing, and we report on the best host depletion methods for obtaining representative and reproducible microbial profiles. Finally, we demonstrate that MAG assembly is feasible in low-microbial, high-host biomass urine samples, and that even in this limited study, we were able to gain novel functional insights into urine-associated microbes.
In relation to urine volume, we observed that greater urine volumes (≥ 3 mL) resulted in improved microbial community capture, increased read depth (although not significantly), reduced stochasticity / variability between samples, and reduced contaminant abundance (Fig. 1, 2, S5, S6). The largest urine volume tested in this study was 5 mL. It is possible that urine volumes > 5 mL may further increase recovery of rare taxa, though previous work has suggested that urine sample volume does not necessarily influence total biomass or sequencing depth [18]. Notably, one recent review anecdotally recommended 30 mL-50 mL of catheter-collected urine for 16S rRNA gene profiling [68]. Our study focused on mid-stream free catch urine, which can include microbes from the urethra or skin in addition to the bladder, and would therefore contain a higher microbial biomass than catheter-collected samples [9], which would be more representative of the bladder microbiota alone. Thus, it is reasonable to suggest that greater urine volumes would be advisable for urobiome studies that utilize catheter-collected urine, although further study is warranted.
We next assessed the impact of DNA extraction methods with and without host depletion on multiple sample types (unspiked and host-spiked urine) and sequencing platforms (16S rRNA, shotgun metagenomics). In unspiked (low host biomass) urine, Bacteremia (no host depletion) and DNA Microbiome (host depletion) consistently yielded the greatest DNA concentrations and highest microbial diversity (16S rRNA) (Fig. 3, 4). Additionally, DNA Microbiome and Bacteremia-extracted samples were the most similar compositionally, and both of these methods most accurately captured the taxa and abundances of the ZymoBIOMICS Gut Microbiome Standard (Fig. S12). Notably, we were only able to reliably capture taxa that were found at ≥0.1% abundance in 16S rRNA and shotgun metagenomic data and generate MAGs from taxa found at ≥1.5% abundance. As observed in other studies, interindividual variation (between dogs) generally outweighed differences due to extraction method [18,34]. However, when we employed phylogeny-aware metrics (Unweighted UniFrac), we saw significant differences in microbial composition by extraction method and by dog, suggesting that some host depletion methods bias microbial community profiles through preferential lysis of specific bacterial clades. Importantly, Bacteremia and DNA Microbiome have been identified as accurate and effective DNA extraction methods in other high-host, low-microbial biomass substrates (i.e., nasal swabs, vaginal swabs, urine, biopsies) [18,69-71].
In host-spiked (high host biomass) urine samples, DNA Microbiome, Zymo and Molzym yielded the greatest percentage of microbial reads (22%, 70%, and 71%, respectively) (shotgun metagenomics, Fig. 5). DNA Microbiome also recovered the greatest microbial diversity (MetaPhlAn4, Fig. 6). Notably, Bacteremia, with no host depletion, was not effective in capturing the microbial community in high host biomass urine. As in our 16S rRNA gene analysis, interindividual variation (between dogs) overwhelmed differences by extraction method, though we did not assess the MetaPhlAn4-profiled communities according to phylogenetic differences in composition. The Zymo HostZERO kit did not perform as well in this study as it has in studies on other substrates (respiratory, intestinal biopsy), suggesting that certain host depletion strategies may be differentially effective by substrate [21,69]. Other technologies, not tested in this study, may also prove effective at microbial enrichment, including adaptive sequencing [72] and selective mechanical lysis [22].
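The microbial read percentages quoted above are consistent with the average host read percentages reported in the Results if every non-host read is counted as microbial. The short sketch below makes that arithmetic explicit. Treating all non-host reads as microbial is an assumption made here for illustration, since it ignores unclassified reads.

```python
# Average host read percentages per extraction method, as reported in the Results.
host_read_percent = {
    "Bacteremia": 82,
    "DNA Microbiome": 78,
    "Molzym MolYsis": 29,
    "PMA": 81,
    "Zymo HostZERO": 30,
}

# Assumption: every read that is not host-derived is counted as microbial.
non_host_percent = {method: 100 - host for method, host in host_read_percent.items()}

# Rank methods from most to least non-host reads.
ranked = sorted(non_host_percent.items(), key=lambda item: item[1], reverse=True)
for method, percent in ranked:
    print(f"{method}: {percent}% non-host reads")
```

Under this assumption, PMA and Bacteremia retain roughly 19% and 18% non-host reads, slightly below DNA Microbiome at 22%.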
Important insights have been revealed via read-level analysis of shotgun-sequenced urobiota. For example, in one study, shifts in microbial functional potential were observed in longitudinally collected urine samples of individuals with and without urinary tract symptoms [4]. In another study, microbial virulence factor genes were linked to a distinct subset of individuals with urinary tract infections [25]. A third study compared the urobiome of healthy individuals to calcium oxalate stone formers, and reported reduced abundances of genes associated with oxalate metabolism in the stone formers, suggesting that the urobiota may play a key role in urinary stone disease pathogenesis [26]. Whole-genome sequencing of cultured urine isolates has also revealed key insights. For example, genes enriched in strains of E. faecalis isolated from urine were not found in gut or blood isolates, suggesting unique adaptations for the urinary tract niche [24]. MAG generation offers advantages over read-level analyses and culture as it uniquely provides high-resolution information on specific microbes and their potential functions, without a dependence on culture [73]. Thus, we attempted de novo assembly of MAGs from our urine samples as proof-of-concept for genome-resolved metagenomics in urine. We assembled a total of five high quality (> 90% complete, < 10% contaminated), urine-derived microbial genomes: B. cereus, S. pseudintermedius, S. canis, and two unassigned Arcanobacterium spp. Notably, this study focused on mid-stream free catch urine samples, which include microbes from the bladder, urethra, and skin. Additionally, this study only included a small number of healthy individuals and was not designed to capture the breadth of urobiome diversity. To our knowledge, this is among the first reports of MAGs assembled from urine [26]. The MAGs we assembled have all been identified as members of the urobiota (or as potential uropathogens) in other studies [2,34,74].
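The quality criteria stated above (> 90% complete, < 10% contaminated) amount to a simple filter over genome bins. The following is a minimal sketch of such a filter. The `GenomeBin` record, the bin names, and the completeness and contamination values are all hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def is_high_quality(genome_bin, min_completeness=90.0, max_contamination=10.0):
    """Apply the thresholds quoted in the text: > 90% complete, < 10% contaminated."""
    return (genome_bin.completeness > min_completeness
            and genome_bin.contamination < max_contamination)


# Hypothetical bins with made-up quality estimates.
bins = [
    GenomeBin("bin_01", completeness=98.2, contamination=1.1),
    GenomeBin("bin_02", completeness=85.0, contamination=2.0),
    GenomeBin("bin_03", completeness=95.5, contamination=12.3),
]

high_quality = [b.name for b in bins if is_high_quality(b)]
print(high_quality)
```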
Although the overall number of MAGs we recovered was low, we note that DNA Microbiome yielded a greater number of urine-derived MAGs and generally fewer contaminant MAGs as compared to all other extraction methods. Importantly, the fact that we were able to assemble 10 MAGs from contaminants (i.e., microbial DNA present in reagents and identifiable in negative control samples) highlights the need for rigorous negative controls as well as thorough bioinformatic decontamination to avoid spurious results. Well-validated tools such as decontam [10,36] as well as an awareness of common "kit-ome" taxa [56] are critical for microbiome studies of low biomass substrates.
After assembling MAGs, we went on to identify key functions in each MAG including core carbon and nitrogen metabolic pathways, urea metabolism, and environmental chemical degradation. We identified full urea-degrading complexes (ureABCEFGD) in 2 MAGs (B. cereus and S. pseudintermedius) in 3 of the 7 dogs (Fig. S13). As urea is a major component of urine [75], the ability to metabolize urea may be a valuable function / adaptation for urine-associated microbes. As for environmental chemical degradation, there are well-established links between environmental chemical exposures and urinary tract diseases like bladder cancer [63,76]. In fact, a recent meta-analysis reported that bacteria associated with PAH degradation were found at increased abundances in the urine of individuals with bladder cancer [2]. While we did not find evidence for microbial PAH degradation in this limited study on healthy dogs, we did find evidence for long-chain alkane degradation in 2 urine-derived MAGs (B. cereus and S. pseudintermedius) found in 3 of the 7 dogs. Long-chain alkanes are common environmental pollutants that result from industrial combustion processes [65,77] and can be found in urine [78,79]. Our findings provide novel evidence that the microbes represented by 2 urine-derived MAGs may degrade long-chain alkanes. This proof-of-concept study highlights the importance of understanding if and how host-associated microbes may be metabolizing environmental chemicals, so that we can then examine the potential impacts of this metabolism on host health or in diseases like bladder cancer [80-82].

Conclusions
Key takeaways from this study:
1. Urine sample volumes of ≥ 3 mL produced the most consistent urobiome profiles in dogs, which are a robust model for the human urobiome.
2. Microbial taxa found at ≥ 0.1% abundance were reliably detected via 16S rRNA gene and shotgun metagenomic sequencing, but MAG assembly was only feasible at greater abundances (≥ 1.5%), and strain differentiation in metagenomic data may require a higher ANI threshold than employed in this study (99% ANI was used in this study).
3. Generally, interindividual differences in urobiome profiles overwhelmed differences due to DNA extraction method.
4. In urine samples with low host biomass (unspiked), the QIAamp BiOstic Bacteremia kit (with no host depletion) yielded the greatest microbial DNA concentrations and highest microbial diversity (e.g., captured more / rarer urine taxa).
5. In urine samples with high host biomass (host-spiked), the QIAamp DNA Microbiome kit yielded the greatest microbial DNA concentrations, highest microbial diversity, and greatest number of identified metagenome-assembled genomes (MAGs), while effectively depleting host DNA.
6. MAG assembly is feasible but limited in urine samples. Maximizing urine volume to increase microbial reads would likely improve MAG recovery. Gene-based queries to assess functional potential of the urobiome are feasible with shotgun metagenomic data in the absence of MAG assembly, although linking function (genes) to microbial species is more challenging with this approach.
7. Urine-derived MAGs revealed evidence of urea and environmental chemical (long-chain alkane) degradation, both of which are relevant for understanding how microbes live and adapt to the urine environment, as well as how they can potentially modulate environmental exposures in a way that could impact host health.
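The abundance thresholds in takeaway 2 can be expressed as a small decision rule. The sketch below is illustrative only: the function name and the example abundances are hypothetical, and the thresholds are empirical observations from the mock community in this study rather than general limits. The rule also does not capture exceptions such as the eukaryote Candida albicans, for which no MAG was assembled despite its 1.5% abundance.

```python
def expected_recovery(abundance_percent, detect_threshold=0.1, mag_threshold=1.5):
    """Classify a taxon by the abundance thresholds observed in this study.

    Thresholds are in percent relative abundance and are inclusive.
    """
    if abundance_percent >= mag_threshold:
        return "detection and MAG assembly"
    if abundance_percent >= detect_threshold:
        return "detection only"
    return "likely undetected"


# Hypothetical relative abundances (%) chosen to fall in each category.
for abundance in (12.0, 0.5, 0.01):
    print(f"{abundance}% -> {expected_recovery(abundance)}")
```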
Urobiome research trails the study of other host-associated microbiomes [8], and continued optimization of urobiome profiling is critical to enable the mechanistic and functional insights necessary for understanding how these microbes impact host health.
S6). B) Microbial richness, or the number of unique amplicon sequence variants (ASVs), increased significantly with increased sample volume (p=0.015, Friedman) and 5.0 mL samples had significantly greater numbers of unique ASVs compared to 0.5 mL (p=0.031), 0.2 mL (p=0.031), and 0.1 mL samples (p=0.048) (multiple comparisons were FDR-corrected at 0.05, Table S7). C) Sequencing depth (reads) was increased at greater urine sample volumes although this difference was not significant (p=0.075, Friedman). Box and whisker plots show the median, IQR, and min/max. S8).
Whiskers represent minimum, maximum, and median. *p<0.05. C) Microbial composition (Bray-Curtis)
as well as the reconstructed relevant pathway. For the depiction of the regulon, only up to ten neighboring genes on each side were included, and the coloring denotes arbitrary groupings with the gene responsible for alkane activation the darkest (i.e., ssuD and ladB), and genes that weren't directly related to the predicted alkane metabolism colored grey. The numbers below the arrows indicate the gene number on the contig. For the depiction of the reconstructed alkane metabolism pathways, colors denote the number of genes that may be involved at each reaction, noting that for simplicity beta-oxidation has been summarized in one ellipse broken into five pieces. Further description of these results in Supplementary Information 1.

Supplementary Files
This is a list of supplementary files associated with this preprint. LewisZJ.et.al.Supplementary.V3.pdf
